A Study on Speaker Normalized MLP Features in LVCSR
Authors
Abstract
Different normalization methods are applied in recent Large Vocabulary Continuous Speech Recognition (LVCSR) systems to reduce the influence of speaker variability on the acoustic models. In this paper we investigate the use of Vocal Tract Length Normalization (VTLN) and Speaker Adaptive Training (SAT) in Multi-Layer Perceptron (MLP) feature extraction on an English task. We achieve significant improvements with each normalization method and gain further by stacking the normalizations. Studying features transformed by Constrained Maximum Likelihood Linear Regression (CMLLR) based SAT as a possible input for the MLP, further experiments show that the MLP could not consistently take advantage of SAT as it does in the case of VTLN.
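The paper itself gives no code, but as a rough sketch of the kind of front-end it describes, the snippet below applies a per-speaker CMLLR affine transform (assumed to have been estimated during SAT, which is outside the scope of this sketch) to VTLN-warped features and stacks a temporal context window to form the MLP input. The function names, dimensions, and placeholder transform are illustrative assumptions, not taken from the paper.

import numpy as np

def apply_cmllr(frames, W, b):
    # Per-speaker CMLLR affine transform x' = W x + b; W and b are
    # assumed to have been estimated during SAT training (not shown).
    return frames @ W.T + b

def stack_context(frames, left=4, right=4):
    # Stack +/-4 neighbouring frames around each frame, a common way
    # to form the input window of an MLP front-end.
    T = frames.shape[0]
    padded = np.vstack([np.repeat(frames[:1], left, axis=0),
                        frames,
                        np.repeat(frames[-1:], right, axis=0)])
    return np.hstack([padded[i:i + T] for i in range(left + right + 1)])

# Illustrative usage with random placeholders (39-dim VTLN-warped frames).
T, D = 200, 39
feats = np.random.randn(T, D).astype(np.float32)      # VTLN-warped features
W, b = np.eye(D, dtype=np.float32), np.zeros(D, dtype=np.float32)
mlp_input = stack_context(apply_cmllr(feats, W, b))   # shape (T, 9 * D)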
Similar Articles
On using MLP features in LVCSR
One of the major research thrusts in the speech group at ICSI is to use Multi-Layer Perceptron (MLP) based features in automatic speech recognition (ASR). This paper presents a study of three aspects of this effort: 1) the properties of the MLP features which make them useful, 2) incorporating MLP features together with PLP features in ASR, and 3) possible redundancy between MLP features and mo...
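As a hedged illustration of the TANDEM-style combination of MLP and PLP features mentioned above, the sketch below log-compresses MLP phone posteriors, decorrelates them with a PCA/KLT projection (assumed to be estimated on training data), and concatenates the result with PLP features frame by frame. The dimensions and random placeholders are assumptions for illustration only, not the ICSI recipe.

import numpy as np

def tandem_features(mlp_posteriors, plp, pca_matrix, eps=1e-8):
    # Log-compress the MLP phone posteriors, reduce them with a
    # PCA/KLT matrix, and concatenate with PLP features per frame.
    log_post = np.log(mlp_posteriors + eps)    # (T, n_phones)
    reduced = log_post @ pca_matrix            # (T, k)
    return np.hstack([plp, reduced])           # (T, D_plp + k)

# Hypothetical shapes: 46 phone posteriors reduced to 25 dims, 39-dim PLP.
T = 300
posteriors = np.random.dirichlet(np.ones(46), size=T)
plp = np.random.randn(T, 39)
pca = np.random.randn(46, 25)                  # placeholder projection
feats = tandem_features(posteriors, plp, pca)  # shape (300, 64)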
Context-Dependent MLPs for LVCSR: TANDEM, Hybrid or Both?
Gaussian Mixture Model (GMM) and Multi-Layer Perceptron (MLP) based acoustic models are compared on a French large vocabulary continuous speech recognition (LVCSR) task. In addition to optimizing the output layer size of the MLP, the effect of the deep neural network structure is also investigated. Moreover, using different linear transformations (time derivatives, LDA, CMLLR) on conventional M...
(Deep) Neural Networks
This work continues the development of the recently proposed Bottle-Neck features for ASR. A five-layer MLP used in bottleneck feature extraction makes it possible to obtain an arbitrary feature size without dimensionality reduction by transforms, independently of the MLP training targets. The MLP topology – the number and sizes of layers, suitable training targets, the impact of output feature transforms, the ne...
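A minimal sketch of such a five-layer bottleneck MLP is given below, assuming an input–h1–bottleneck–h2–output topology with illustrative layer sizes; the linear activations of the narrow middle layer are returned as features, so the feature dimension is set by the bottleneck width rather than by the number of training targets. Training is omitted and all weights are random placeholders.

import numpy as np

def bottleneck_forward(x, weights, biases):
    # Forward pass of a five-layer MLP (input -> h1 -> bottleneck ->
    # h2 -> output). The linear activations of the bottleneck layer
    # are returned as features; the output layer (e.g. phone targets)
    # is only needed during training.
    h = x
    bottleneck = None
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i == 1:                    # second weight matrix feeds the bottleneck
            bottleneck = h.copy()
        if i < len(weights) - 1:      # sigmoid on hidden layers only
            h = 1.0 / (1.0 + np.exp(-h))
    return bottleneck

# Illustrative topology (not from the paper): 351 -> 1500 -> 39 -> 1500 -> 135.
sizes = [351, 1500, 39, 1500, 135]
rng = np.random.default_rng(0)
weights = [0.01 * rng.standard_normal((a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]
feats = bottleneck_forward(rng.standard_normal((200, 351)), weights, biases)
print(feats.shape)   # (200, 39): feature size set by the bottleneck width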
Speaker Recognition Via Nonlinear Discriminant Features
We use a multi-layer perceptron (MLP) to transform cepstral features into features better suited for speaker recognition. Two types of MLP output targets are considered: phones (Tandem/HATS-MLP) and speakers (Speaker-MLP). In the former case, output activations are used as features in a GMM speaker recognition system, while for the latter, hidden activations are used as features in an SVM syste...
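As one possible reading of the Speaker-MLP setup, the sketch below extracts hidden-layer activations for every frame and mean-pools them into a single fixed-length vector that an SVM back-end could consume. The single-hidden-layer topology, the mean pooling, and all dimensions are assumptions for illustration, not the paper's exact recipe.

import numpy as np

def utterance_vector(frames, W1, b1):
    # Hidden activations of a (hypothetical) single-hidden-layer
    # Speaker-MLP, mean-pooled over the utterance into one fixed-length
    # vector suitable as SVM input.
    hidden = 1.0 / (1.0 + np.exp(-(frames @ W1 + b1)))   # (T, H)
    return hidden.mean(axis=0)                           # (H,)

# Illustrative dimensions: 39-dim cepstral frames, 400 hidden units.
T, D, H = 500, 39, 400
rng = np.random.default_rng(1)
frames = rng.standard_normal((T, D))
W1, b1 = 0.05 * rng.standard_normal((D, H)), np.zeros(H)
svm_input = utterance_vector(frames, W1, b1)             # shape (400,)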
Analysis and Comparison of Recent MLP Features for LVCSR Systems
MLP-based front-ends have evolved in different ways in recent years beyond the seminal TANDEM-PLP features. This paper aims at providing a fair comparison of these recent advances, including the use of different long/short temporal inputs (PLP, MRASTA, wLP-TRAPS, DCT-TRAPS) and the use of complex architectures (bottleneck, hierarchy, multistream) that go beyond the conventional three-layer MLP. F...